1 |
Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual Retrieval ...
Abstract:
Pretrained multilingual text encoders based on neural Transformer architectures, such as multilingual BERT (mBERT) and XLM, have achieved strong performance on a myriad of language understanding tasks. Consequently, they have been adopted as a go-to paradigm for multilingual and cross-lingual representation learning and transfer, rendering cross-lingual word embeddings (CLWEs) effectively obsolete. However, questions remain as to what extent this finding generalizes 1) to unsupervised settings and 2) to ad-hoc cross-lingual IR (CLIR) tasks. Therefore, in this work we present a systematic empirical study focused on the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a large number of language pairs. In contrast to supervised language understanding, our results indicate that for unsupervised document-level CLIR -- a setup with no relevance judgments for IR-specific fine-tuning -- pretrained encoders fail to significantly outperform models ...
URL: https://www.repository.cam.ac.uk/handle/1810/327496 https://dx.doi.org/10.17863/cam.74949
BASE

2 |
Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders ...
BASE

4 |
Cross-lingual semantic specialization via lexical relation induction ...
BASE

5 |
Adversarial propagation and zero-shot cross-lingual transfer of word vector specialization ...
BASE

6 |
Do we really need fully unsupervised cross-lingual embeddings? ...
BASE

7 |
On the relation between linguistic typology and (limitations of) multilingual language modeling ...
BASE

8 |
Cross-lingual semantic specialization via lexical relation induction
Ponti, Edoardo; Vulić, I; Glavaš, G. - : EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference, 2020
BASE

9 |
On the relation between linguistic typology and (limitations of) multilingual language modeling
BASE

10 |
Adversarial propagation and zero-shot cross-lingual transfer of word vector specialization
BASE

11 |
Do we really need fully unsupervised cross-lingual embeddings?
Vulić, I; Glavaš, G; Reichart, R. - : EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference, 2020
BASE

12 |
Towards zero-shot language modeling
Ponti, Edoardo; Vulić, I; Cotterell, R. - : EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference, 2020
BASE

14 |
Zero-shot language transfer for cross-lingual sentence retrieval using bidirectional attention model ...
BASE

15 |
Learning unsupervised multilingual word embeddings with incremental multilingual hubs ...
BASE

16 |
Specializing distributional vectors of all words for lexical entailment ...
BASE

17 |
Investigating cross-lingual alignment methods for contextualized embeddings with Token-level evaluation ...
BASE

18 |
Specializing distributional vectors of all words for lexical entailment
BASE

19 |
Investigating cross-lingual alignment methods for contextualized embeddings with Token-level evaluation
BASE

20 |
Learning unsupervised multilingual word embeddings with incremental multilingual hubs
Heyman, G; Verreet, B; Vulić, I. - : NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference, 2019
BASE